Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed messages are repeated on page refresh & process queue only when client is leader #2703

Merged
merged 14 commits into from
Aug 13, 2021

Conversation

parasharrajat
Copy link
Member

@parasharrajat parasharrajat commented May 5, 2021

Please review @marcaaron.

Details

  1. This PR tries to fix the duplicate request issue from the Web platform.
  2. On the web, a user can open multiple tabs for E.cash. And all of these clients are able to send requests.
  3. But the User can go offline any time. we have a mechanism to handle requests when the user is offline.
  4. Currently there is an issue that these persisted requests are duplicated when the user comes back online.
  5. Each client(tab) has an in-memory network queue. and We have global NETWORK_REQUEST_QUEUE for persisted requests.
  6. When a user is offline all the clients will push their requests to an in-memory network queue, from which we filter all the retriable requests and push those to our persisted queue. We filter the in-memory queue and remove these persisted requests that are stored in the persisted queue.
  7. Due to Onyx, this persisted queue is synced across the clients.
  8. When the user comes back online, we ask the leader client to process all the persisted requests from each client. Thus no request will be duplicated. After all the requests are sent to the server, messages will be sync across tabs via Onyx, keeping the same report actions for all clients.

Fixed Issues

Fixes #2451
Fixes #4214

Tests / QA Steps

  1. Log in to staging.expensify.cash
  2. Navigate to a conversation
  3. Disable the internet connection
  4. Send a message while offline
  5. Enable the internet connection and immediately refresh the page.
  6. Messages should not be repeated.

Tested On

  • Web (Affected Platfrom)
  • Mobile Web
  • Desktop
  • iOS
  • Android

Screenshots

Web

Before
OFFLINE.mp4

After

new.mp4

@parasharrajat parasharrajat requested a review from a team as a code owner May 5, 2021 19:42
@MelvinBot MelvinBot requested review from joelbettner and removed request for a team May 5, 2021 19:43
let isInit = false;

// Are we ready to determine active Client?
const isReady = new Promise((res) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are needed as the Client manager incorrectly reports the leadership of the client

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe elaborate on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sure. There are cases where we run important callbacks once but It is possible that the client is not set before that callback call. Thus a promise to make sure that we run the callback. We have to use a flag and combination of Promises to achieve this.
as

function init() {
    Onyx.merge(ONYXKEYS.ACTIVE_CLIENTS, [clientID]);
    isInit = true;
}

The above method does not really initialize the client it just request Onyx to set the client id. merge Ops is async and I think it does not return thenable callback thus we have to go to connect call.

 if (isInit) {
            setReady();
        }

Now we mark is ready by resolving the promise as we have set the clientId on the client stack. This connect could fire multiple times, we only resolve the promise when isInit is set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to wait until the client is set to do the rest of the logic here?

One idea would be to add an onReady function to ActiveClientsManager so that we can register a callback then wait to do the Onyx.connect() below?

I think it might make it more obvious that we are waiting for the active client to be set

ActiveClientsManager.onReady(() => {
    Onyx.connect({
        key: ONYXKEYS.NETWORK_REQUEST_QUEUE,
        callback: ...
    })
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok Thanks I am seeing the problem now. And yeah, Onyx being asynchronous is making this difficult once again 😓

Copy link
Contributor

@marcaaron marcaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @parasharrajat, Can we try to document this code better and add some comments? Overall, I'm confused by the implementation and unsure why we're doing most of what we're doing.

let isInit = false;

// Are we ready to determine active Client?
const isReady = new Promise((res) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe elaborate on this?

src/libs/ActiveClientManager/index.js Outdated Show resolved Hide resolved

let isQueuePaused = false;

// Are we retrying requests when user comes back online?
let retryRequestsAdded = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above doesn't really help explain what the code is doing or how it works.

src/libs/Network.js Outdated Show resolved Hide resolved
return;
}
ActiveClientManager.isReady.then(() => {
// Only process queue when client is Leader and we have not retried the requests already
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we only process the queue when the client is the leader? (Put it in the comment)

@@ -112,6 +122,7 @@ function processNetworkRequestQueue() {
!request.data.doNotRetry && request.data.persist
));
Onyx.set(ONYXKEYS.NETWORK_REQUEST_QUEUE, retryableRequests);
retryRequestsAdded = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set this flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcaaron. We have a queue of requests as networkRequestQueue. we take retry requests and save them in ONYXKEYS.NETWORK_REQUEST_QUEUE in case the user closes the tab or refreshes it. Now let's say the user's internet is back but he didn't refresh the tab so networkRequestQueue will continue processing all those requests. This flag records this behavior. so when this is true we know we have processed the requests and clear the requests.

 // User came back online and the request queue will resume, thus we should clear the NETWORK_REQUEST_QUEUE
    if (ActiveClientManager.isClientTheLeader() && retryRequestsAdded) {
        Onyx.set(ONYXKEYS.NETWORK_REQUEST_QUEUE, []);
    }

Also Onyx.set(ONYXKEYS.NETWORK_REQUEST_QUEUE, retryableRequests);. We reset the queue here but what will happen if the user is sending requests from multiple tabs. We will overwrite the old requests. Do we want this to behave like this or we should merge those requests Onyx.merge?

@@ -160,6 +171,11 @@ function processNetworkRequestQueue() {
.catch(error => onError(queuedRequest, error));
});

// User came back online and the request queue will resume, thus we should clear the NETWORK_REQUEST_QUEUE
if (ActiveClientManager.isClientTheLeader() && retryRequestsAdded) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little lost on why we only want to clear the queue when the client is the leader and there is also no explanation about the significance of retryRequestsAdded. Can you try explaining this better in the comments?

// User came back online

How exactly does this tell us the user came back online?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So When a user goes offline we filter the network request queue and put the retrievable requests on persisted Queue. But it is possible that a user can come back online without refreshing the page. Now those retriable requests will process as a part of networkRequestQueue. Thus we don't want to process the persisted queue ONYXKEYS.NETWORK_REQUEST_QUEUE. It should only run when the user refresh the page.

Copy link
Member Author

@parasharrajat parasharrajat May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only allow the leader to handle the ONYXKEYS.NETWORK_REQUEST_QUEUE as persisted requests shared by tabs and make sure that we only clear the queue when they were processed via normal networkRequestQueue. As we have processed the normal queue, now we will not need them to be persisted thus clear them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it's still a little unclear to me, but I might have to just look harder at this and try to offer some better suggestions.

@parasharrajat
Copy link
Member Author

I will comment on everything as requested.

let isInit = false;

// Are we ready to determine active Client?
const isReady = new Promise((res) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to wait until the client is set to do the rest of the logic here?

One idea would be to add an onReady function to ActiveClientsManager so that we can register a callback then wait to do the Onyx.connect() below?

I think it might make it more obvious that we are waiting for the active client to be set

ActiveClientsManager.onReady(() => {
    Onyx.connect({
        key: ONYXKEYS.NETWORK_REQUEST_QUEUE,
        callback: ...
    })
});

let setReady = null;

// Whether the client Manager has started
let isInit = false;
Copy link
Contributor

@marcaaron marcaaron May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead could we have something like:

shouldFireOnReadyCallback and didInitialize and then do

if (shouldFireOnReadyCallback && !didInitialize) {
    onReadyCallback();
    didInitialize = true;
}

let isInit = false;

// Are we ready to determine active Client?
const isReady = new Promise((res) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok Thanks I am seeing the problem now. And yeah, Onyx being asynchronous is making this difficult once again 😓

src/libs/Network.js Outdated Show resolved Hide resolved
@@ -160,6 +171,11 @@ function processNetworkRequestQueue() {
.catch(error => onError(queuedRequest, error));
});

// User came back online and the request queue will resume, thus we should clear the NETWORK_REQUEST_QUEUE
if (ActiveClientManager.isClientTheLeader() && retryRequestsAdded) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it's still a little unclear to me, but I might have to just look harder at this and try to offer some better suggestions.

@parasharrajat
Copy link
Member Author

@marcaaron So I found one more side effect so I changed the logic completely.

Now the code completely rely on persisted queue. let me know what do you think of it. I would be happy to explain everything.

Due to Onyx.connect is called when user comes back online, we can just use the persisted queue and new logic is build up on that.

*
* @returns {Promise<void>}
*/
function isReady() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way we don't need to keep the promise in memory. I used Promise over your suggested callback method onReady(() => {}). We can just use promise in multiple places for multiple tasks. The callback would require an array to be implemented for that purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I want to do it the way I've suggested for consistency reasons. It's not that your idea won't work, it's that we don't have many patterns like this and it's pretty unusual to see.

We can just use promise in multiple places for multiple tasks. The callback would require an array to be implemented for that purpose.

There is no need to do this as only one thing needs to know whether we have "initialized".

*
* @returns {Promise<void>}
*/
function isReady() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I want to do it the way I've suggested for consistency reasons. It's not that your idea won't work, it's that we don't have many patterns like this and it's pretty unusual to see.

We can just use promise in multiple places for multiple tasks. The callback would require an array to be implemented for that purpose.

There is no need to do this as only one thing needs to know whether we have "initialized".

if (didLoadPersistedRequests || !persistedRequests) {
return;
}
ActiveClientManager.isReady().then(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I will repeat that I would prefer that we not do the check this way.


// If we have a request then we need to check if it can be persisted in case we close the tab while offline.
// We filter persisted requests from the normal Queue to remove duplicates
networkRequestQueue = _.reject(networkRequestQueue, (request) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We filter persisted requests from the normal Queue to remove duplicates

Which duplicates are we talking about here? Can you provide an example? It's hard to visualize what you mean or why we are removing these from the normal request queue in memory.

Wouldn't this mean that if I add a comment while offline that it will be "removed" from the queue and not sent again when I come back online?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Onyx.connect is called for NETWORK_REQUEST_QUEUE when we come back online which takes care of those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link me to where this happens?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

Copy link
Member Author

@parasharrajat parasharrajat Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user comes online on line 37 we process the queue by fake merge which I will optimize based on the suggestion.

src/libs/Network.js Outdated Show resolved Hide resolved
@parasharrajat
Copy link
Member Author

Sorry. Posting it here.

I found the following issues.

  1. Messages that were repeated were on the same tab. (The first solution fix this part)
  2. I test messages from multiple tabs. But it was not covered so I pushed a new implementation that covers this as well. I think this one is simpler ad more clear.

What the code is doing?

OLD Architecture

####Single Tab(Client) case

When a user is offline and we get requests, We take retry requests from the normal queue and push them to Persisted Queue(We have to). We keep the copy of those requests in the normal queue as well. Thus when we get back online. requests will be sent from Persisted Queue & normal queue.(duplicates)

Multi Client Case

We get requests while offline, Single Tab(Client) case will be repeated for all tabs. But Persisted Queue is shared among tabs so we have multi-tab duplicated requests now. With respect to this, due to when we push to Persisted queue. Onyx.connect will be called and it will push the requests to each tabs normal queue for the first time then didLoadPersistedRequests will block further calls.

So Recent changes change the architecture in the following way

Proposed Architecture

  1. We receive requests when the user is offline, we push retry requests to persisted queue and remove them from the normal queue. Now Onyx.connect will be called for all tabs but isOffline check will block that. For all tabs same behaviour will happen. and Persisted queue has single instance of all the requests from all tabs.

  2. Now the internet is back, Onyx.connect is called for all tabs but only the leader client will push those requests in the networkqueue thus no duplicates.

OR

  1. If the user closed the tab, we still have those requests on Persisted Queue, Onyx.connect will be called for all tabs, Again the only leader will process them.

I think it is a simpler approach and more cleaner but I could be missing something. Otherwise, I don't think fixing the linked issue considering single tab behavior is good it would be temporary. We can wait until we are looking at the whole scenario.

@marcaaron
Copy link
Contributor

Hey @parasharrajat can you please summarize the status of this PR and linked issue? Thanks.

@parasharrajat
Copy link
Member Author

I will look into again and try to cross check everything. Also I will update the Promise to what you have suggested.

Overall, my solution would be based on the explanation above.

@joelbettner
Copy link
Contributor

gentle bump

@parasharrajat
Copy link
Member Author

Thanks for the Bump. I will try to check it off from my list this weekend. I need to rework here as there have been numerous changes since the PR was created.

@parasharrajat
Copy link
Member Author

@joelbettner I have updated the PR and tested it.
@marcaaron I have a concern about your request of using callback instead of Promise to check ActiveClientManager readiness.
There are two scenes.
It is not deterministic when the client will become active. It is possible that the client could be ready already when we register a callback. or another case would be, that client is not yet ready when we register the callback.

For the first case, we need to fire the callback as soon as we register. thus it will require another flag. I believe that the promise-based approach is neater and it allows multiple handlers. Let me know if you want me to do it in a callback way. Thanks.

Also I commented the code. Let me know if some thing is not clear. There is small explanation in the issue description as well.

@marcaaron
Copy link
Contributor

Just leaving a comment that I won't be able to help out with this review since there are higher priority items for me to get to at the moment. If you need help please ask in the Slack channel. Also curious if @TomatoToaster wants to check this PR out since he recently edited these files and might understand how this stuff works.

@marcaaron marcaaron dismissed their stale review June 21, 2021 17:27

I can't review this right now

@TomatoToaster TomatoToaster self-requested a review June 22, 2021 18:28
Copy link
Contributor

@TomatoToaster TomatoToaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi I just had some questions for clarifying. Just making sure I understand how this works correctly.

src/libs/Network.js Outdated Show resolved Hide resolved
src/libs/ActiveClientManager/index.js Outdated Show resolved Hide resolved
src/libs/Network.js Show resolved Hide resolved
@parasharrajat
Copy link
Member Author

Hi @TomatoToaster, I will get back to you with answers on these But I am sure about that flag that it is needed.

@parasharrajat
Copy link
Member Author

Does anyone feel that my solution is worth merging by doing some changes and refactor, then please do let me know? otherwise, this PR can be closed. Thanks.

@tgolen
Copy link
Contributor

tgolen commented Aug 2, 2021 via email

@parasharrajat
Copy link
Member Author

@tgolen any thoughts?

@tgolen
Copy link
Contributor

tgolen commented Aug 3, 2021

Nope, I still have other things on my list to get to before I look at this today. Thanks for your patience!

src/libs/Network.js Outdated Show resolved Hide resolved
// b) User is online.
// c) requests are not already loaded,
// d) When there is at least one request
if (!ActiveClientManager.isClientTheLeader()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still a big fan of this suggestion based on the following arguments:

  • There is no hint anywhere that this method must be run after ActiveClientManager.isReady()
  • If isClientTheLeader() is ever called without .isReady() then it's always going to give a false result

This is setting a trap for anyone else that decides to use isClientTheLeader() in the future.


// If we have a request then we need to check if it can be persisted in case we close the tab while offline.
// We filter persisted requests from the normal Queue to remove duplicates
networkRequestQueue = _.reject(networkRequestQueue, (request) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump

src/libs/Network.js Show resolved Hide resolved
@parasharrajat
Copy link
Member Author

parasharrajat commented Aug 3, 2021

If isClientTheLeader() is ever called without .isReady() then it's always going to give a false result

It is not correct. Only on page reload this gives the wrong result (Until Onyx sets the key). for later cases it works just fine.
That's why notifications are not broken in the app on the web.

bump

Sorry I didn't understand.

bump

Same for this.

I want to start a separate convo from the above discussion, and it's specifically about the use of a "fake merge" in order to trigger a callback. I'm concerned that this is a little bit of a "hack" and it's not clear what's really happening. Instead, I think it would be better like this:

Sure I will do that.

@tgolen
Copy link
Contributor

tgolen commented Aug 3, 2021

It is not correct. Only on page reload this gives the wrong result (Until Onyx sets the key).

OK, thanks for explaining. I see I wasn't completely correct, but the trap is still there. For example: if I add code somewhere else that executes on page load, and only reference isClientTheLeader() without using .isReady() then I wouldn't get an accurate result.

Sorry I didn't understand.

Oh, sorry! "bump" is a common term we use that indicates there is an unanswered question that needs to be addressed.

Sure I will do that.

Thanks!

@parasharrajat
Copy link
Member Author

parasharrajat commented Aug 6, 2021

Updated @tgolen.

@parasharrajat
Copy link
Member Author

parasharrajat commented Aug 9, 2021

This is ready for another review. I can move to the callback-based onReady method if pointed to do so.

ATM this PR fixes two issues. Not sure of the prioirty.

@parasharrajat
Copy link
Member Author

parasharrajat commented Aug 11, 2021

🙄 Others please share your valuable feedback.

@parasharrajat
Copy link
Member Author

Sorry, I am being impatient here but see no reason to hold unless you tell me the reason.

Copy link
Contributor

@tgolen tgolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making that change to the callbacks.

We are not really seeing eye-to-eye on isClientTheLeader() being asynchronous.

If @TomatoToaster and @joelbettner are absolutely fine with the current implementation, and they feel they understand it and could fix any race conditions with it (as opposed to just approving this PR because of review-fatigue), then I am OK merging this as-is.

@tgolen
Copy link
Contributor

tgolen commented Aug 11, 2021

All yours @marcaaron

@parasharrajat
Copy link
Member Author

We are not really seeing eye-to-eye on isClientTheLeader() being asynchronous.

I agree with the point. Just finding it difficult to handle in the ONYX.connect callback without too much boilerplate. I am ready to move forward with the current implementation as ClientManager usages do not have any such race condition. Maybe if we have any problems with this we can rethink the approach here.

@marcaaron
Copy link
Contributor

I have to be honest... reviewing this is going to be a bit of a context switch for me at the moment. The changes still seem a bit hard to follow, but to be fair, the persisting of requests did not really fit cleanly into the design when I added it.

That said, I feel like it's very possible we will want to do something like a Phase 2 Refactor for the networking code. And that's probably a good time to think a bit more about how we want this to work and just do it right.

So yeah that's a long way of saying... it feels like persisting requests are in a somewhat broken state and if this fixes them then I'm 👍

@parasharrajat
Copy link
Member Author

Thanks Marc. Not sure what is the next step.

@TomatoToaster
Copy link
Contributor

Yeah I think we can merge this. I can't think of a more perfect way to fix right now this and I agree that we can revisit this on a stage 2 refactor. I'll also sign up to help out on fixing any bugs with these persisted requests going forward 👍🏾 and help out with the eventual refactor since I've got this context at least.

@TomatoToaster TomatoToaster merged commit d6ff523 into Expensify:main Aug 13, 2021
@OSBotify
Copy link
Contributor

✋ This PR was not deployed to staging yet because QA is ongoing. It will be automatically deployed to staging after the next production release.

@OSBotify
Copy link
Contributor

🚀 Deployed to staging by @TomatoToaster in version: 1.0.85-10 🚀

platform result
🤖 android 🤖 success ✅
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 success ✅
🕸 web 🕸 success ✅

@botify
Copy link

botify commented Aug 25, 2021

This has been deployed to production and is now subject to a 7-day regression period.
If no regressions arise, payment will be issued on 2021-09-01. 🎊

@OSBotify
Copy link
Contributor

🚀 Deployed to production by @roryabraham in version: 1.0.86-11 🚀

platform result
🤖 android 🤖 success ✅
🖥 desktop 🖥 success ✅
🍎 iOS 🍎 failure ❌
🕸 web 🕸 success ✅

@botify
Copy link

botify commented Aug 30, 2021

This has been deployed to production and is now subject to a 7-day regression period.
If no regressions arise, payment will be issued on 2021-09-06. 🎊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants